Development of a Tool to Measure Student Perceptions of Equity and Inclusion in Medical Schools

This cohort study developed and validated an instrument to measure students’ perceptions of the climate of equity and inclusion in medical schools.


Introduction
Creating an inclusive and equitable learning environment has been a longstanding priority for national medical organizations including the Association of American Medical Colleges (AAMC), 1 the American Medical Association, 2 the National Medical Association, 3 and the National Academies of Sciences. 4 Yet, despite a national sense of urgency, data reflecting medical students' perception of the climate of equity and inclusion are limited.
Our limited understanding of the climate of equity and inclusion creates several problems that have direct implications for the health care workforce. First, without clear, concise, and recurrent data, we cannot precisely quantify disparities in equity and inclusion that exist among medical students from different backgrounds. Second, we can gauge neither the immediate impact of these disparities on student well-being and achievement nor their downstream impacts on the diversity of the physician workforce. Third, we lack the ability to develop evidence-based interventions to address disparities related to equity and inclusion in the learning environment.
Consequently, we developed a new tool to measure students' perceptions of the climate of equity and inclusion in medical school using data collected annually by the AAMC. In this article, we report on the development and psychometric validation of the tool as a sustainable, scalable, and generalizable measure of equity and inclusion in doctor of medicine (MD)-granting institutions.

Methods
The development and validation of the tool, Promoting Diversity, Group Inclusion, and Equity (PRODIGIE), proceeded through several stages. First, a Delphi panel was convened to identify survey items reflecting students' perceptions of the climate of equity and inclusion in the medical learning environment from preexisting AAMC data sources. Next, 5 years of student responses to these items were obtained from the AAMC and used for analysis. Exploratory factor analysis (EFA) was performed on item responses to construct the tool. The tool then underwent psychometric validation. The study followed the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) reporting guideline. This study was approved by the Yale University institutional review board. Informed consent was not required because the analysis was conducted on a deidentified dataset.

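Tool Development

Delphi Panel Members
The Delphi panel included a team of 9 experts in diversity, equity, inclusion, medical education, and psychometrics that was diverse in terms of race, ethnicity, sex, career stage, and professional background (D.B., K.H., D.B., L.C.P., S.S., S.C.S., R.R., A.J., and E.A.). Members included physicians, PhDs in education and organizational psychology, and medical students.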

Defining Equity and Inclusion
Before Delphi rounds began, Delphi panel members met to reflect upon and determine definitions for equity and inclusion that would be used throughout the Delphi process. Equity was defined as ensuring that everyone has access to the same opportunities. Inclusion was defined as all individuals feeling respected, supported, and valued.

Tool Item Selection
Panel members were given copies of all the AAMC medical student survey instruments, which included the Year 2 Questionnaire (Y2Q) and the Graduation Questionnaire (GQ). They were also given copies of the American Medical College Application Service and Electronic Residency Application Service instruments. Each year, survey and application data are collected by the AAMC and represent a comprehensive reflection of students' self-reported demographic data and lived experiences during medical school.
Item selection for the tool was performed over the course of 3 Delphi panel rounds. During each round, panelists reviewed documents and provided recommendations on survey items for inclusion before meeting as a group. Subsequently, panelists met to discuss the survey questions identified.
After each round, a summary document of results from the prior round was created and distributed to panelists. This process had several benefits. Allowing experts to provide input before meeting afforded panel members anonymity when responding and reduced limitations commonly inherent to group interactions, including the influences of dominant personalities and the pressure to conform. 15 Controlled quantitative and qualitative feedback between rounds allowed each expert to generate additional insights and to clarify information from previous rounds. Meetings permitted the benefits of face-to-face interaction to exchange information and resolve uncertainties.
During the third round of the Delphi panel, members focused on resolving any conflicts regarding survey items for inclusion in the tool. Survey items for which consensus was reached were included.
As is common in the Delphi process, consensus was defined as agreement among 75% of panelists. 16
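The consensus rule can be illustrated with a short sketch. It assumes each panelist casts a yes/no vote per item and reads the 75% threshold as "at least 75%"; the function name, item names, and votes are hypothetical and are not taken from the study.

```python
from fractions import Fraction

def reached_consensus(votes, threshold=Fraction(3, 4)):
    """Return True when the share of 'include' votes meets the threshold."""
    if not votes:
        raise ValueError("at least one vote is required")
    return Fraction(sum(votes), len(votes)) >= threshold

# Hypothetical third-round votes from a 9-member panel (True = include the item).
item_votes = {
    "item_a": [True] * 7 + [False] * 2,  # 7 of 9 in favor
    "item_b": [True] * 6 + [False] * 3,  # 6 of 9 in favor
}
retained = [item for item, votes in item_votes.items() if reached_consensus(votes)]
print(retained)
```

With 9 panelists, an item needs at least 7 votes under this reading, because 6 of 9 is below 75%.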

Final Survey Item List
The Delphi panel identified 146 survey items. Survey items were chosen from the AAMC's Y2Q (80 items) and GQ (66 items).

Study Population and Data Sources
Student responses were obtained for all survey items identified by the Delphi panel. Student responses came from the 2015 to 2019 administrations of the AAMC Y2Q and the 2016 to 2020 administrations of the AAMC GQ. Demographic data for student self-reported race, ethnicity, sex, sexual orientation, and socioeconomic status were abstracted from the AAMC's data warehouse and the AAMC's Applicant Matriculant Data File. Participants selected race and ethnicity from the following categories: Asian; American Indian, Alaska Native, Native Hawaiian, or Other Pacific Islander; Black or African American; Hispanic; White; multiracial; and unknown/other. All students who reported Hispanic ethnicity were categorized as Hispanic regardless of race. Students were considered to come from low-income backgrounds if they indicated that they received Pell grants and/or state or federal financial assistance. Consistent with prior literature, marginalized identity was defined as a sociodemographic identity known to have been historically minoritized or discriminated against in medicine (female sex; non-White race; Hispanic ethnicity; lesbian, bisexual, or gay sexual orientation; and low-income status). 10,17,18 These groups were chosen because studies with intersectional analyses of medical trainees' experiences have shown that, while other demographic groups are exposed to bias and discrimination, individuals who identify as underrepresented in medicine, lesbian, gay, bisexual, or who have multiple marginalized identities often report the highest rates of bias, mistreatment, and discrimination. 10,17,18,20
To generate the data sets for EFA and confirmatory factor analysis (CFA), we used the random selection function in SPSS version 26 (IBM). The sample analyzed from the Y2Q totaled 30 571 (EFA: 14 544; CFA: 16 207) and that from the GQ totaled 36 266 (EFA: 18 307; CFA: 17 959).
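As a rough illustration of splitting a sample into EFA and CFA subsets, the sketch below uses Python's standard library instead of the SPSS random selection function used in the study. The function name, the seed, and the stand-in record IDs are assumptions. This version produces two exactly equal halves, whereas the reported subsets were only approximately equal in size.

```python
import random

def split_in_half(record_ids, seed=0):
    """Shuffle IDs reproducibly and split them into two disjoint halves."""
    ids = list(record_ids)
    random.Random(seed).shuffle(ids)
    midpoint = len(ids) // 2
    return ids[:midpoint], ids[midpoint:]

# Stand-in IDs matching the GQ sample size reported above (36 266 responses).
gq_ids = range(36266)
efa_ids, cfa_ids = split_in_half(gq_ids, seed=26)
print(len(efa_ids), len(cfa_ids))
```

Fixing the seed keeps the split reproducible, so the same respondents land in the same subset each time the analysis is rerun.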

Exploratory Factor Analysis
Exploratory factor analyses 21,22 were conducted on the GQ and Y2Q data sets separately to determine the extent to which the survey items represented underlying factors. We performed exploratory factor analysis for the year 2 and graduation time points separately because of the distinct preclinical and clinical environments that are often demarcated at the second year of medical education. Because we hypothesized that all or some of the latent factors were correlated, we used the common factors method of analysis with an oblique (PROMAX) rotation of the correlation matrices. 22 Beginning with data from the GQ, data from approximately one-half of the sample were randomly selected and used to conduct the exploratory factor analysis. The Kaiser-Meyer-Olkin measure of sampling adequacy and the Bartlett test of sphericity were calculated to assess the appropriateness of the data for factor analysis. Items were retained if they had a factor loading of 0.4 or higher and if they loaded on only 1 factor. A parallel process was used to conduct an EFA on the Y2Q survey data.
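The item retention rule can be sketched as a simple filter over a matrix of rotated factor loadings. This is a minimal sketch under two assumptions that the text does not spell out: "loaded on only 1 factor" is read as meeting the 0.4 cutoff on exactly one factor, and the cutoff is applied to absolute values. The item names and loadings are hypothetical.

```python
def retained_items(loadings, cutoff=0.40):
    """Keep items whose absolute loading meets the cutoff on exactly one factor.

    Returns a mapping from item name to the index of its single salient factor.
    """
    kept = {}
    for item, row in loadings.items():
        salient = [factor for factor, value in enumerate(row) if abs(value) >= cutoff]
        if len(salient) == 1:
            kept[item] = salient[0]
    return kept

# Hypothetical rotated loadings for 4 items on 2 factors.
example_loadings = {
    "faculty_respect": [0.72, 0.10],
    "peer_support": [0.05, 0.66],
    "cross_loader": [0.48, 0.45],  # dropped: salient on both factors
    "weak_item": [0.31, 0.22],     # dropped: below the cutoff on every factor
}
print(retained_items(example_loadings))
```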

Psychometric Validation
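Confirmatory Factor Analysis | CFA was used to test the models identified through EFA procedures with the remaining approximately one-half of the samples from the GQ and Y2Q. Using standard conventions, a root mean square error of approximation (RMSEA) close to 0.06, a comparative fit index (CFI) and Tucker-Lewis index (TLI) close to 0.95, and a standardized root mean square residual (SRMR) less than 0.08 indicated good model fit. 26
Internal Consistency | Internal consistency is an important component of reliability that indicates whether people give similar responses to items meant to measure the same factor. 27 Internal consistency was evaluated via Cronbach α, calculated for each factor.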
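For readers less familiar with Cronbach α, the sketch below computes it for one factor from the standard formula α = k/(k − 1) × (1 − Σ item variances / variance of total scores), where k is the number of items. The function name and the response data are hypothetical. The study's own computation was presumably done in statistical software, which this is not meant to reproduce.

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """Cronbach alpha for one factor.

    item_scores: list of per-item response lists, aligned by respondent.
    """
    k = len(item_scores)
    if k < 2:
        raise ValueError("need at least two items")
    # Total score per respondent across all items of the factor.
    totals = [sum(responses) for responses in zip(*item_scores)]
    item_variance_sum = sum(variance(item) for item in item_scores)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))

# Hypothetical responses of 5 students to 3 items on a 5-point scale.
factor_items = [
    [4, 5, 3, 4, 2],
    [4, 4, 3, 5, 2],
    [5, 5, 2, 4, 3],
]
alpha = cronbach_alpha(factor_items)
print(round(alpha, 2))
```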
Criterion Validity | Criterion validity evaluates whether an instrument produces scores that correlate with outcomes in a manner consistent with theory. 28 To examine the criterion validity of the tool, we conducted 2 tests.
First, we calculated mean tool scores for each medical school by only including responses from students who reported no marginalized identities (ie, students who reported being male sex, non-Hispanic White race, heterosexual, and not low-income). Then we calculated the scores for students who reported 1 marginalized identity, followed by 2 marginalized identities, 3 marginalized identities, and last, mean tool scores by medical school for students who reported 4 marginalized identities. We hypothesized that the mean medical school tool scores would decrease as students reported additional marginalized identities.
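The grouping used in the first test can be sketched as follows. This is an illustration only: the record fields, category strings, and scores are hypothetical. Following the reference group described above, race and ethnicity are treated as a single identity, which gives a maximum count of 4. Counting every category other than non-Hispanic White (including unknown/other) as marginalized is a simplifying assumption.

```python
from collections import defaultdict
from statistics import mean

def identity_count(student):
    """Count marginalized identities (0 to 4) for one student record."""
    return sum([
        student["sex"] == "female",
        student["race_ethnicity"] != "non-Hispanic White",  # assumption: all other categories count
        student["lgb"],
        student["low_income"],
    ])

def mean_score_by_count(students):
    """Mean tool score within each identity-count group for one school."""
    groups = defaultdict(list)
    for student in students:
        groups[identity_count(student)].append(student["score"])
    return {count: mean(scores) for count, scores in sorted(groups.items())}

# Hypothetical records for a single school.
school = [
    {"sex": "male", "race_ethnicity": "non-Hispanic White", "lgb": False, "low_income": False, "score": 84.0},
    {"sex": "female", "race_ethnicity": "non-Hispanic White", "lgb": False, "low_income": False, "score": 82.0},
    {"sex": "male", "race_ethnicity": "Asian", "lgb": False, "low_income": False, "score": 80.0},
    {"sex": "female", "race_ethnicity": "Black or African American", "lgb": False, "low_income": True, "score": 78.0},
]
print(mean_score_by_count(school))
```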
Second, we determined differences in mean tool scores between marginalized and nonmarginalized groups (female vs male; Asian, Black, and Hispanic/Latinx vs White; lesbian, gay, or bisexual [LGB] vs non-LGB; and low socioeconomic status [SES] vs non-low SES). We hypothesized that members of marginalized groups would have lower mean tool scores than their nonmarginalized peers.
Tool scores for individual factors identified through factor analysis were calculated as the average of all items comprising those factors. Because individual items may have originally used scales of varying ranges, all individual items were standardized to a 5-point scale (range, 1-5).
Overall medical school tool scores were calculated as the sum of all subscale scores, to give equal weight across factors, and normalized to a scale ranging from 1 to 100.
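The scoring steps can be sketched for a single respondent as shown below. The text does not give the exact standardization or normalization formulas, so this sketch assumes simple linear rescaling at both steps. The factor names, responses, and original scale ranges are hypothetical.

```python
def rescale(value, old_min, old_max, new_min, new_max):
    """Linearly map value from [old_min, old_max] onto [new_min, new_max]."""
    return new_min + (value - old_min) * (new_max - new_min) / (old_max - old_min)

def overall_score(factors):
    """Overall 1-100 score for one respondent.

    factors: {factor name: [(response, scale_min, scale_max), ...]}
    """
    subscale_scores = []
    for items in factors.values():
        # Step 1: put every item on a common 1-5 scale (assumed linear).
        standardized = [rescale(r, lo, hi, 1, 5) for r, lo, hi in items]
        # Step 2: a factor score is the average of its items.
        subscale_scores.append(sum(standardized) / len(standardized))
    # Step 3: sum the factor scores, then map the possible range onto 1-100.
    n = len(subscale_scores)
    return rescale(sum(subscale_scores), 1 * n, 5 * n, 1, 100)

# Hypothetical respondent: two factors, one with an item on a 4-point scale.
respondent = {
    "faculty_support": [(4, 1, 5), (5, 1, 5)],
    "discrimination": [(3, 1, 4)],
}
print(overall_score(respondent))
```

Summing factor averages rather than raw items means a factor with many items does not outweigh a factor with few, which matches the stated goal of equal weight across factors.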
We compared the mean medical school tool scores calculated by the number of students' marginalized identities using analysis of variance (ANOVA) with Tukey adjustment for multiple comparisons. For all other identities, 2-tailed independent t tests were used to determine whether differences in mean scores between nonmarginalized and marginalized groups were statistically significant. Significance was set at a P value of less than .05.
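To make the comparison concrete, the sketch below computes a one-way ANOVA F statistic together with the η² effect size (between-group sum of squares divided by total sum of squares). It is a minimal standard-library illustration with hypothetical scores. It does not compute P values or Tukey-adjusted pairwise comparisons, which would normally come from a statistics package.

```python
from statistics import mean

def one_way_anova(groups):
    """Return (F statistic, eta squared) for a list of score groups."""
    all_scores = [x for group in groups for x in group]
    grand_mean = mean(all_scores)
    group_means = [mean(group) for group in groups]
    ss_between = sum(
        len(group) * (m - grand_mean) ** 2 for group, m in zip(groups, group_means)
    )
    ss_within = sum(
        (x - m) ** 2 for group, m in zip(groups, group_means) for x in group
    )
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    eta_squared = ss_between / (ss_between + ss_within)
    return f_stat, eta_squared

# Hypothetical school-level mean scores for students with 0, 1, and 2 identities.
scores_by_identity_count = [
    [84.0, 85.0, 83.0],
    [82.0, 81.0, 83.0],
    [79.0, 80.0, 78.0],
]
f_stat, eta_squared = one_way_anova(scores_by_identity_count)
print(round(f_stat, 2), round(eta_squared, 3))
```

With exactly 2 groups, the F statistic equals the square of the pooled-variance independent t statistic, so the same function also gives an η² for a two-group comparison.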

Results
The

Confirmatory Factor Analysis
Results indicated that the 5-factor model for the GQ was appropriate for the data. The RMSEA was 0.06, indicating a good fit. The CFI (0.94) and TLI (0.93) values also indicated acceptable fit, as did the SRMR (0.04). Additionally, results demonstrated the 8-factor model for the Y2Q was appropriate. The RMSEA was 0.05, indicating a very good fit. The CFI (0.95) and the TLI (0.94) also indicated acceptable fit, as did the SRMR of 0.04. All factor loadings were greater than 0.40 for both the 5-factor GQ model and the 8-factor Y2Q model, which is considered acceptable. 29 Table 2 provides the individual items of the tool, with corresponding factor loadings from the EFA and CFA.

Internal Consistency
For the GQ PRODIGIE model, Cronbach α for each factor ranged from 0.76 to 0.95 (Table 2). For the Y2Q model, Cronbach α for each factor ranged from 0.69 to 0.92 (Table 2). With the exception of 1 marginal Y2Q factor, Cronbach α for all factors was greater than 0.70, which is in the range considered acceptable. 30

Criterion Validity
Tool scores were calculated for 134 medical schools at the Y2Q time point and 129 schools at the GQ time point. At the GQ time point, the mean (SD) medical school tool score was 82.9 (2.5), with scores ranging from 77.1 to 91.3. For the Y2Q time point, the mean (SD) medical school tool score was 80.6 (2.5), and scores ranged from 74.6 to 88.3. Mean (SD) medical school tool scores were greatest when only including students who reported no identities historically marginalized in medicine (82.3 [2.8] for the Y2Q and 84.4 [2.7] for the GQ).
Hispanic (79.94 [10.99]) students reported lower scores than White students (81.28 [9.73]) (η² = 0.008; P < .001); scores for LGB students were lower than scores for non-LGB students (78.17 [10.85] vs 80.81 [10.01]; η² = 0.005; P < .001); and scores for students from low-income backgrounds were lower than scores for students not from low-income backgrounds (79.91 [10.69] vs 80.89 [9.83]; η² = 0.001; P < .001) in analyzing scores from the Y2Q time point (Table 3).
Similarly, mean (SD) tool scores calculated at the GQ time point were lower for female students compared with male students (81.82 [10.19]

Discussion
Our results demonstrate that the new tool is a reliable and psychometrically valid measure of medical students' perceptions of equity and inclusion in the learning environment. The tool exhibited acceptable internal consistency overall and within individual factors. Consistent with our a priori assumptions and prior literature, 10,12,14,18 students from historically marginalized backgrounds rated the medical learning environment less favorably than students from more privileged groups, supporting the tool's criterion validity.
This new validated tool provides several benefits. It can provide a global assessment of the climate of equity and inclusion at all MD-granting institutions. Insight gained from the tool's global assessment is complemented and bolstered by specific data present in individual tool factors, which reflect students' perceptions of important facets of the learning environment, including faculty support of students, student fellowship, and discrimination. The tool provides an assessment of equity and inclusion in the learning environment at 2 critical and distinct time points, which, for most schools, demarcate a transition from a classroom-based to a clinical learning environment. Because the tool can measure differences in the climate of equity and inclusion by student sociodemographic identity, it can quantify the experience of inequity and exclusion in undergraduate medical education, both nationally and at the level of individual medical schools.
Because the tool relies on existing surveys administered by the AAMC, it has none of the drawbacks of traditional climate surveys, such as participant survey fatigue. To facilitate and standardize tool score calculations, medical schools could partner with the AAMC to obtain school-level and national benchmarking data. The tool score reports could be aggregated over 3-year or 5-year time periods to ensure student anonymity, especially for students reporting multiple identities historically marginalized in medicine.
It is important to note that differences in mean tool scores between marginalized and nonmarginalized groups are statistically significant, but the differences and their associated effect sizes are small. Moreover, the size of the differences between groups is similar to results found in other climate surveys used by the AAMC and in prior national studies. 36,37 Future studies of the tool will examine how differences in tool scores influence disparities in consequential student outcomes in the learning environment, including attrition, successful placement into graduate medical education, burnout, and the receipt of academic awards.

Limitations
Our study has limitations. Items selected for inclusion in the tool were limited to survey questions currently administered to students by the AAMC. Consequently, the tool may not capture all aspects of equity and inclusion in the learning environment. Nevertheless, as the AAMC expands its cadre of data collection instruments examining students' experiences, the tool can be enhanced to capture these phenomena.
Creation of the tool relied on student responses to AAMC surveys. The AAMC's GQ has a historical response rate of approximately 80%, 38 and the AAMC's Y2Q has a response rate of 59%. 39 Prior studies have shown that students from marginalized backgrounds may be less likely than their peers to complete these surveys. 18 Consequently, the tool may not capture the perspective of marginalized students in its entirety.
Historically, students from marginalized groups have higher rates of attrition in medical school than their counterparts. 40 Tool results, especially at the GQ time point, may not fully reflect the degree of inequity and exclusion in the medical school learning environment secondary to student survival bias.
Additionally, the tool, in its current iteration, does not reflect the lived experience of medical students from all marginalized backgrounds. In particular, understanding how students reporting disabilities, religious beliefs, and nonbinary gender identities experience the learning environment will be critical to promote equity and inclusion in medicine.

Conclusions
This study demonstrates that this new tool is a psychometrically valid and reliable measure of the climate of equity and inclusion in the medical school learning environment. Medical schools can use the tool to benchmark their students' perception of equity and inclusion in the learning environment and identify areas for improvement to address disparities.

JAMA Network Open | Equity, Diversity, and Inclusion
JAMA Network Open. 2024;7(2):e240001. doi:10.1001/jamanetworkopen.2024.0001. February 21, 2024

background (D.B., K.H., D.B., L.C.P., S.S., S.C.S., R.R., A.J., and E.A.). Members included physicians, PhDs in education and organizational psychology, and medical students.

Using standard conventions, RMSEA close to 0.06, a CFI and TLI close to 0.95, and SRMR less than 0.08 indicated good model fit. 26

Internal Consistency | Internal consistency is an important component of reliability that indicates

Table 1. Characteristics of Year 2 Questionnaire and Graduation Questionnaire Cohorts
a Low-income is defined as students who either received a Pell grant or state or federal financial assistance.

Table 2. Factor Loadings of the Year 2 Questionnaire and Graduation Questionnaire: Exploratory and Confirmatory Analyses a
a EFA column displays the factor loadings of the exploratory factor analysis. CFA column denotes the factor loadings from the confirmatory analysis. A criterion of 0.45 was used as the cutoff for inclusion on a factor.

Table 3. Year 2 and Graduation Questionnaire Average Overall Promoting Diversity, Group Inclusion, and Equity Factor Scores Across Students' Sociodemographic Characteristics
Abbreviation: NA, not applicable.

Figure. School-Averaged Overall Promoting Diversity, Group Inclusion, and Equity Tool Scores Across Groups of Multiple Marginalized Identities